Heterochromatin is the enigmatic 
eukaryotic genome compartment 
found mostly at telomeres and cen- 
tromeres. Conventional approaches to 
sequence assembly and genetic manipu- 
lation fail in this highly repetitive, gene- 
sparse, and recombinationally silent 
DNA. In contrast, genetic and molecular 
analyses of euchromatin-encoded pro- 
teins that bind, remodel, and propagate 
heterochromatin have revealed its vital 
role in numerous cellular and evolutionary
processes. Utilizing the 12 sequenced
Drosophila genomes, Levine et al. took a
phylogenomic approach to discover new
protein "surrogates" of heterochromatin
function and evolution. This paper
reported over 20 new members of what
was traditionally believed to be a small
and static Heterochromatin Protein 1
(HP1) gene family. The newly identified
HP1 proteins are structurally diverse,
lineage-restricted, and expressed primarily
in the male germline. The birth and
death of HP1 genes follow a "revolving
door" pattern, in which new HP1s appear
to replace old HP1s. Here, we address
alternative evolutionary models that could
drive this constant innovation.

HP1: A Flashlight for the Dark 
Matter of Eukaryotic Genomes 

The human genome sequence is not complete.
Neither are the Mus musculus,
Drosophila melanogaster, or Arabidopsis
thaliana genomes. Indeed, "complete
genome sequence" assemblies from many
eukaryotes may be missing up to 30% of
nuclear-encoded DNA. The unassembled
genome compartment consists mostly of
heterochromatin, which is packed with
satellites and transposable elements but
relatively sparse in protein-coding genes
and recombination events. The challenges
of assembling repetitive DNA and of genetic
mapping in regions of low recombination,
combined with a mistaken perception that
heterochromatin harbors few functional
elements or no genes, have contributed to
decades of scientific neglect. With the
advent of new sequencing technologies
and the painstaking efforts of large
consortia of researchers (e.g., the Drosophila
Heterochromatin Genome Project), this
slight is slowly being corrected, with
significant advances in our understanding of
heterochromatin function and evolution.

Despite renewed interest in the study
of heterochromatin DNA sequence, most
insights into heterochromatin function
have emerged from genetic and molecular
studies of euchromatin-encoded proteins
that affect heterochromatin properties,
especially the expression of proximal genes
in the phenomenon of position-effect
variegation (PEV). These and subsequent
studies led to the awareness that
heterochromatin participates in many essential
cellular processes, including chromosome
segregation, genome defense, and gene
regulation, shifting the scientific
community from disinterest to broad
appreciation of heterochromatin's
biological significance. The resulting picture
revealed that, rather than representing a
sea of functionally uninteresting,
homogeneous repeats, heterochromatin
contains many disparate elements of varied
functions (e.g., piwi-associated RNA, or
piRNA, clusters for genome defense).

Our understanding of heterochromatin
function was transformed by a pivotal
1986 publication describing D. melanogaster's
Heterochromatin Protein 1 (HP1).
Using monoclonal antibodies against a
protein fraction tightly bound to DNA,
James and Elgin uncovered a chromosomal
protein that localizes primarily to
pericentric heterochromatin and is encoded
by Su(var)2-5, one of the genes
previously shown to suppress PEV. Subsequent
analysis of HP1 (now called "HP1A")
and its numerous interacting partners
illuminated an unprecedented number of
heterochromatin-dependent essential
processes. Centromere maintenance, RNA
interference, and telomere protection,
for example, all rely on HP1-dependent
heterochromatin integrity. This progress
highlights the power of
heterochromatin-bound proteins as molecular tools to
reveal new roles for this ubiquitous but
cryptic genome compartment.

More recently, these
heterochromatin-bound proteins have been used to
reveal the evolutionary forces that may
act on the rapidly evolving but unassembled,
and sometimes undefined, heterochromatic
sequence to which they bind.
For example, the finding that a female
germline-restricted HP1 protein,
Rhino/HP1D, had evolved under positive
selection (faster than expected amino acid
divergence) predicted its engagement in
a molecular arms race with transposable
elements. This prediction was supported
by the finding that Rhino binds
heterochromatin-embedded, rapidly evolving
piRNA clusters, which themselves
likely evolve under positive selection to
"immunize" genomes against new
invasions of transposable elements. Thus,
rigorous population genetic and molecular
evolution analyses of heterochromatin
protein "surrogates" could reveal
evolutionary dynamics at the repetitive
heterochromatin sequence itself, for which such tests
are undeveloped.
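
One common way to formalize "faster than expected amino acid divergence" is the ratio of nonsynonymous to synonymous substitution rates. The expression below is a generic illustration only; the symbols d_N, d_S, and ω are introduced here for convenience, and this is not necessarily the specific test that was applied to Rhino/HP1D.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\omega < 1 \ \text{(purifying selection)}, \quad
\omega \approx 1 \ \text{(neutral evolution)}, \quad
\omega > 1 \ \text{(positive selection)}
```

Under this convention, a protein-coding gene engaged in an arms race would be expected to show ω > 1 at least at some codons or on some lineages.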

HP1 Phylogenomics: Expanding 
the Family Business 

We set out to discover new
"heterochromatin surrogates" for both functional
and evolutionary analysis. We focused
our analysis on the gene family founded
by HP1A. At first glance, the HP1 gene
family seems like a poor target for
phylogenomics. For many years, it was
considered small, static, and structurally
homogeneous. HP1 family members are
traditionally defined by a single, common
domain structure: a chromodomain,
hinge, and chromoshadow domain (Fig.
1). Using this definition, HP1B and HP1C
were discovered in the newly sequenced
D. melanogaster genome in 2000, bringing
the family size up to three. Additional
HP1s in non-Drosophila genomes were
also identified, including three in humans
(HP1α, HP1β, HP1γ), which are not
orthologous to any of the Drosophila HP1
genes and instead likely derived from an
HP1B-like ancestor.

a We refer to the Su(var)2-5 gene as HP1A here purely
for ease of reference and comparison with the other
paralogous HP1 genes in Drosophila.

At that time, three was the largest
HP1 family size known in any eukaryote,
including yeast, worms, mouse, and
Arabidopsis. Furthermore, all three
Drosophila HP1s and all three mammalian
HP1s are highly conserved across broad
evolutionary distances. Remarkably,
human HP1α can rescue D. melanogaster
HP1A-dependent loss of silencing despite
paralogy. These observations supported
the idea that functional and evolutionary
stasis is a defining feature of this small
family. However, the fortuitous discovery
of a new D. melanogaster HP1 in a female
sterility screen, called "rhino," together
with its signature of strong positive
selection, suggested that HP1s are potentially
more numerous and more plastic than
originally thought. The subsequent
discovery of a fifth HP1 in D. melanogaster,
HP1E, supported this prediction and
motivated our comprehensive
phylogenomic analysis.

HP1A through HP1E served as entry
points into characterizing the HP1 gene
family in the recently sequenced 12
Drosophila genomes, which represent over
40 million years of Drosophila evolution.
Using all five chromodomains and
chromoshadow domains as queries, we
computationally searched the 12 genomes for
significant tBLASTn hits. Our iterative
search strategy, in which any significant
domain hit becomes a search query itself,
returned over 100 hits across this 40
million year snapshot. Using the evolutionary
definition of "gene family," we assigned
these hits to the HP1 family on the basis
of phylogenetic relationships.
Combined with information
about synteny, we identified orthologous,
paralogous, and anciently diverged
members relative to known HP1s. Orthologs
have diverged through speciation events;
thus, they are both syntenic (in the same
genomic location) and more closely related
to each other than to other HP1 genes.
For example, the HP1A genes from all 12
genomes form a single monophyletic
(single evolutionary origin) clade. Paralogs
represent sister clades; for example, the
HP1C and HP1A gene clades share a
common ancestor that is more than 40 million
years old. We also identified younger
paralogs that cluster within older HP1s, like
the relatively young, HP1D/Rhino-derived
Oxpecker genes. These represent cases
of more recent gene duplication events
within pre-existing HP1 clades. Finally,
a few paralogs defied grouping within
old or new HP1 gene clades, representing
either ancient origins or very rapid
divergence (they still share stronger identity to
HP1 chromo or shadow domains than to
any other Drosophila proteins).
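
The iterative search strategy described above, in which every significant hit is fed back in as a new query, can be sketched as a simple closure computation. This is a minimal illustration, not the pipeline actually used: `iterative_search`, `find_hits`, and the toy hit table are hypothetical names introduced here, and in practice `find_hits` would wrap a real similarity search (for example a tBLASTn run with a significance cutoff).

```python
from typing import Callable, Iterable, Set


def iterative_search(seeds: Iterable[str],
                     find_hits: Callable[[str], Set[str]]) -> Set[str]:
    """Expand a set of queries until no new significant hits appear."""
    found = set(seeds)      # every sequence seen so far (seeds included)
    queue = list(found)     # sequences that still need to be used as queries
    while queue:
        query = queue.pop()
        for hit in find_hits(query):
            if hit not in found:
                found.add(hit)
                queue.append(hit)   # a new hit becomes a query itself
    return found


# Toy stand-in for a similarity search (hypothetical data, not real results).
TOY_HITS = {
    "HP1A_chromo": {"HP1B_chromo", "HP1C_chromo"},
    "HP1B_chromo": {"HP1A_chromo", "candidate_1"},
    "candidate_1": {"candidate_2"},
    "unrelated": {"other"},
}


def toy_find_hits(query: str) -> Set[str]:
    """Return the significant hits for one query in the toy table."""
    return TOY_HITS.get(query, set())


result = iterative_search(["HP1A_chromo"], toy_find_hits)
print(sorted(result))
```

In this toy example, `candidate_2` is recovered only because `candidate_1` was itself reused as a query, which is the point of iterating. Hits gathered this way would still need the phylogenetic and synteny checks described above before being assigned to the family.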

We found that HP1A, HP1B, HP1C
and HP1D orthologs occurred in all
12 genomes in syntenic locations.
However, HP1E orthologs had clearly
degenerated in several species, implicating
recurrent HP1E gene loss. This seemingly
unique species-specificity of HP1E proved
to be the rule for all new HP1 paralogs
we discovered. Indeed, none of the other
HP1 genes discovered are present in all
12 genomes. Virtually all of these newly
identified HP1s evolved within the last
40 million years and so appear in only a
restricted set of lineages.

Even more unexpectedly, while the
canonical HP1 domain structure is
defined by the presence of both a
chromodomain and a chromoshadow domain
(Fig. 1A), the majority of new HP1
family members encoded only one of these
domains, having lost or degenerated
either the original chromo or
chromoshadow domain during or after
duplication (Fig. 1B). We initially disregarded
these "half-HP1s" as duplicate genes
caught in the act of pseudogenization.
However, upon examining the syntenic
locations of these half-HP1s, we
confirmed that many had been retained for
many millions of years. Long-term
retention is consistent with function,
particularly in Drosophila, where the half-life
of pseudogenes is remarkably short.
Further supporting this prediction, we
found evidence of transcription for 18 of
the 19 genes in our list. Intriguingly,
virtually all are transcribed primarily in the
male germline. Four genes are encoded in
the well-annotated genome of D.
melanogaster: one chromodomain-only HP1
and three chromoshadow-only HP1s (one
of which we already know is essential).

Figure 1. (A) Canonical HP1 domain structure. The chromodomain ("chromo") recognizes
H3K9me, the hinge binds DNA and/or RNA, and the chromoshadow domain ("shadow") homodimerizes
and heterodimerizes. (B) Alternative paths underlying half-HP1 birth. Drift or selection drives the
degeneration of the chromodomain (in this example) following a full HP1 duplication event.
Alternatively, the duplication itself is restricted to a single domain. A 3' retrotransposition bias may
underlie the enrichment of chromoshadow domain-only HP1s observed in our data set.

Figure 2. Revolving door dynamics: gene number stasis, recurrent birth, and recurrent death. The
10 chromoshadow-only genes represented are all expressed primarily in testis. Each lineage
harbors either two or three HP1s of this domain class, but these genes are rarely shared across distant
lineages. Symbols distinguish expression assayed directly by tissue-restricted RT-PCR from expression inferred.

Our data are consistent with a
minimum HP1 gene family size of 26 in the
12 Drosophila species sampled. This
represents a 4-fold increase in HP1 gene
number. Furthermore, the unprecedented
structural diversity of the new HP1 family
members offers a dramatically expanded
toolkit for discovering new
heterochromatin functions. The pervasive
lineage-restriction is consistent with species- or
clade-specific adaptations that rely on
young HP1 genes (below), and may offer
insights into the evolutionary significance
of the rampant between-species
divergence observed at the heterochromatin
sequence itself. Moreover, the
predominance of male germline expression is
consistent with currently uncharacterized
male-specific chromosome biology
driving this lineage-specific adaptation.

A 'Revolving Door' of HP1 
Proteins in the Drosophila Male 
Germline 

The male germline has recurrently
emerged as a venue enriched for
signatures of positive selection across many
gene classes. These DNA signatures
include statistical enrichment for new
amino acid changes and retention of
gene duplicates, the latter of which
results in expansion of gene families.
An agnostic analysis of Drosophila gene
family evolution across the 12 genomes
demonstrated that significant family
expansions (and contractions) are
enriched for male reproduction-related
functions. Intriguingly, the abundant
male germline-expressed HP1 paralogs
are not part of a gene family expansion in
the strict sense. Despite rampant gene
birth and death over the 40 million
years, the number of HP1 genes in a
given Drosophila species varies only
modestly (for the chromoshadow-only class,
see Figure 2). This gene number stasis,
despite prolific gene birth, is consistent
with functional gene replacements over
time. Borrowing a term from Demuth
and Hahn, we refer to this pattern as an
HP1 gene family "revolving door."

b The HP1 family designation is an evolutionary
classification; significant functional work needs to be done
to ascertain whether any or all of the new genes indeed
encode heterochromatin-binding proteins.



www.landesbioscience.com    Fly    139

[Figure 3 panel labels. (A) HP1 duplication; drift → degeneration; drift → fixation. (B) HP1 duplication; deleterious dosage → degeneration; positive selection → fixation, or drift → fixation. (C) HP1 duplication; selfish element birth/fixation; HP1 degeneration; HP1 suppressor evolution/fixation; selfish element degeneration.]

Figure 3. Alternative forces driving gene replacements. (A) Birth and then fixation/death under neutral forces. (B) Birth and then fixation under neutral
forces or positive selection, death driven by negative selection to relieve dosage effects, such as heterochromatin expansion/contraction or positive/
negative transcriptional regulation. (C) Birth and then fixation under positive selection to suppress recurrently evolving selfish elements, death
under neutral forces. p, parent gene; d, daughter gene.

At a genome-wide scale, recurrent
gene turnover is consistent with a
neutral model of gene family evolution.
Averaged across all genes, a steady-state
birth/death process (that assumes
an equilibrium genome size) readily
accounts for gene turnover. Under this
model, a gene duplication event generates
a daughter copy that is ultimately retained
while the parent copy accumulates
mutations under genetic drift (Fig. 3A). At the
level of a single gene family with elevated
birth/death rates, however, this model is
less satisfying; specifically, under
neutrality the gene death rate varies
independently of gene copy number. Chromatin
proteins, and specifically HP1s,
are typically dosage-sensitive. An extra
gene copy can, for example, suppress
or enhance heterochromatin spreading
along a chromosome. We speculate that
gene duplications of some
chromatin-protein encoding loci are instantly
visible to natural selection. Consequently,
an HP1 gene death rate parameter may
vary positively with gene copy number,
which at least partially explains the
gene family-wide revolving door pattern
(Fig. 3B). This slight variation on
Birchler's "gene balance hypothesis"
may also explain our observation that
half-HP1s evolve exclusively from full
HP1s. A mutation that breaks a
chromodomain or chromoshadow domain in
the full HP1's daughter copy instantly
relieves deleterious dosage effects. Relief
from deleterious dosage effects may free
up the daughter copy to evolve along its
own evolutionary trajectory.
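
The contrast between a copy-number-independent death rate (neutrality) and a death rate that rises with copy number (dosage sensitivity) can be illustrated with a toy birth/death simulation. This is a sketch under stated assumptions, not a fitted model: the per-gene birth probability, the target copy number of three, the linear form of the dosage-dependent death rate, and all function names are hypothetical choices made purely for illustration.

```python
import random
from statistics import pvariance

BIRTH = 0.05    # assumed per-gene duplication probability per time step
TARGET = 3      # assumed copy number at which birth and death balance
STEPS = 200     # time steps per simulated lineage
REPLICATES = 300


def neutral_death(n: int) -> float:
    """Neutral model: per-gene death rate is independent of copy number."""
    return BIRTH


def dosage_death(n: int) -> float:
    """Dosage model: per-gene death rate rises linearly with copy number."""
    return BIRTH * n / TARGET


def simulate(death_rate, rng: random.Random, n0: int = TARGET) -> int:
    """Return the gene copy number in one lineage after STEPS time steps."""
    n = n0
    for _ in range(STEPS):
        d = min(1.0, death_rate(n))
        births = sum(rng.random() < BIRTH for _ in range(n))
        deaths = sum(rng.random() < d for _ in range(n))
        n = n + births - deaths   # deaths never exceed n, so n stays >= 0
    return n


rng = random.Random(1)  # fixed seed so the run is reproducible
neutral_counts = [simulate(neutral_death, rng) for _ in range(REPLICATES)]
dosage_counts = [simulate(dosage_death, rng) for _ in range(REPLICATES)]

print("variance of copy number, neutral:", pvariance(neutral_counts))
print("variance of copy number, dosage :", pvariance(dosage_counts))
```

In this sketch, lineages under the dosage model keep turning genes over yet stay close to the target copy number, whereas neutral lineages drift widely or lose the family entirely. Gene number stasis despite ongoing birth and death is the qualitative signature of the revolving door pattern.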

In addition to this negative selection,
positive selection may also explain the
recurrent gene birth and death across the
HP1 family (Fig. 3C). Heterochromatin
is riddled with selfish elements. These
genomic parasites gain a fitness
advantage upon self-replication or drive in the
germline, where they have direct access to
the next generation. Germline-restricted
HP1s, like the numerous
Rhino/HP1D-derived chromodomain-only Oxpecker
genes, may suppress this selfish activity.
Once the selfish element is successfully
silenced, both it and its suppressor degenerate.
Recurrent bouts of selfish element birth
and degeneration may explain at least
some of this HP1 turnover in the male
germline.

Just as the discovery and study of
histone variants have greatly transformed
our understanding of chromatin functions
and states, analysis of this diverse toolkit
of heterochromatin surrogates promises
to reveal both currently unknown
cellular roles for heterochromatin and
the evolutionary forces that act on this
still understudied genome compartment.
Moreover, this kind of phylogenomic
approach is gene family- and
taxon-independent. As more and more complete
genome sequencing data sets become
available, we anticipate many more
analyses that overturn false perceptions of
stasis at gene families that encode essential
proteins.